Visual Object Detection with DETR to Support Video-Diagnosis Using Conference Tools

Abstract

Real-time multilingual phrase detection from or during online video presentations, to support instant remote diagnostics, requires near real-time visual (textual) object detection and preprocessing for further analysis. Connecting specialists and sharing specific ideas is most effective using the native language. The main objective of this paper is to analyze and propose, through DEtection TRansformer (DETR) models, architectures, and hyperparameters, recommendations and procedures with simplified methods to achieve reasonable accuracy in real-time textual object detection. The development of real-time video conference translation based on artificial intelligence supported solutions has a relevant impact in the health sector, especially in clinical practice via better video consultation (VC) or remote diagnosis. The importance of this development was augmented by the COVID-19 pandemic. The challenge of this topic is connected to the variety of languages and dialects that the involved specialists speak, which usually needs human translator proxies that can be substituted by AI-enabled technological pipelines. The sensitivity of visual textual element localization is directly connected to the complexity, quality, and variety of the collected training data sets. In this research, we investigated the DETR model with several variations. The research highlights the differences between the most prominent real-time object detectors (YOLO4, DETR, and Detectron2) and brings AI-based novelty to collaborative solutions combined with OCR. The performance of the procedures was evaluated through two research phases: a 248/512 (Phase1/Phase2) record train data set, with a 55/110 set of validated data instances for 7/10 application categories and 3/3 object categories, using the same object categories for annotation. The achieved score breaks the expected values in terms of visual text detection scope, giving high detection accuracy for textual data, with mean average precision ranging from 0.4 to 0.65.

Similar resources

Visual Object Detection using Frequent Pattern Mining

Object search in a visual scene is a highly challenging and computationally intensive task. Most current object detection techniques extract features from images for classification. The results of these techniques show that the feature extraction approach works well for single images but is not sufficient for generalizing over a variety of object instances of the same...

Object Category Detection Using Audio-Visual Cues

Categorization is one of the fundamental building blocks of cognitive systems. Object categorization has traditionally been addressed in the vision domain, even though cognitive agents are intrinsically multimodal. Indeed, biological systems combine several modalities in order to achieve robust categorization. In this paper we propose a multimodal approach to object category detection, using au...

3D Object Detection with Latent Support Surfaces

We develop a 3D object detection algorithm that uses latent support surfaces to capture contextual relationships in indoor scenes. Existing 3D representations for RGB-D images capture the local shape and appearance of object categories, but have limited power to represent objects with different visual styles. The detection of small objects is also challenging because the search space is very la...

Using P300 to Evaluate the Effect of Object Color Knowledge in Novelty Detection

Introduction: In an oddball experiment, the context in which novel stimuli are presented affects the characteristics of the novelty P3, i.e., as long as there is a difficult task in which the difference between standard and target stimuli is small, recurrent presentation of a highly discrepant stimulus can lead to a P300 highly similar to the novelty P3. Effect of stimulus properties on P300 h...

Fire detection using video sequences in urban out-door environment

Nowadays, automated early warning systems are essential in human life. One of these systems is fire detection, which plays an important role in surveillance and security systems because fire can spread quickly and cause great damage to an area. Traditional fire detection methods are usually based on smoke and temperature detectors (sensors). These methods cannot work properly in large space a...

Journal

Journal title: Applied Sciences

Year: 2022

ISSN: 2076-3417

DOI: https://doi.org/10.3390/app12125977